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A cluster identification framework illustrated 
by a filtering model for earthquake 
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A general dynamical cluster identification framework including both modeling and computation 
is developed. The earthquake declustering problem is studied to demonstrate how this framework 
applies. 

A stochastic model is proposed for earthquake occurrences that considers the sequence of 
occurrences as composed of two parts: earthquake clusters and single earthquakes. We suggest 
that earthquake clusters contain a "mother quake" and her "offspring." Applying the filtering 
techniques, we use the solution of filtering equations as criteria for declustering. A procedure 
for calculating maximum likelihood estimations (MLE's) and the most likely cluster sequence is 
also presented. 

Keywords: earthquakes; filtering; Kushner-Stratonovich equations; marked point process; 
Zakai equations 

1. Introduction 

Suppose one observes a series of events Xi , X2 , • ■ • , Xn occurring at times ri , r2 , . . . , t„ . 
Each event is either "normal" or "abnormal." The objective is to identify those "abnor- 
mal" events. 

One application of this problem is in epidemiology. For instance, the patients with 
Severe Acute Respiratory Syndrome (SARS) have symptoms similar to those of common 
flu patients. However since SARS is much more infectious than common flu, the SARS 
patients often appear in clusters. Such statistical evidence enables us to identify the 
SARS patients by mathematical tools. It provides a supplementary method to the costly 
medical test. 

Another application is to collusion set detection. In a stock market, a group of traders 
forms a collusion set if they heavily trade among themselves in order to manipulate 
the stock price. It is of interest to catch this kind of malpractice as early as possible. 
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Considering each trade record as an event, it is intuitive that the mahcious trading events 
tend to cluster. Assuming that a distance measuring the dissimilarity between any two 
records is available, Palshikar and Apte tackle the problem via graph clustering in [12]. 
They ignore the time stamp on the trade record so that a point process is reduced to a 
graph. But the temporal information is lost in their method. 

These examples motivated our filtering model. We model the observations as a mixture 
of two independent marked point processes representing the "normal" and the "abnor- 
mal" events, respectively. Each new "abnormal" event will change the intensity of the 
"abnormal" point process. Typically the "abnormal" event increases the intensity for ad- 
ditional "abnormal" events in its neighborhood. Our goal is to compute the conditional 
probability of each observed event being abnormal in real time. Employing filtering tech- 
niques, we derive versions of the Zakai and Kushner-Stratonovich equations. Under a 
Markov condition, a sequential algorithm is presented to calculate the exact conditional 
probability that we are interested in. 

Unfortunately, the data set for the two examples above is not available. We will present 
our methodology in the context of the "earthquake declustering problem." Even though 
there is no agreement on the underlying mechanism of earthquake occurrence in the 
current seismology literature, we want to emphasize that this example is mainly for the 
purpose of illustration. Our framework is for general modeling and computation. It could 
be adapted for different data sets in various areas. 

It is well known that earthquakes often occur in clusters. The largest quake in a clus- 
ter is called the main shock, those before it arc called foreshocks, and those after it are 
called aftershocks. The aftershocks in an earthquake swarm are relatively easy to predict. 
However, there are also many earthquakes that strike without any foreshocks or after- 
shocks. As the authors stated in [8]: "To forecast the location of the large earthquakes, it 
is necessary to analyze the background seismicity, for which removal of temporal cluster 
members is considered to be of central importance." 

In this article, we propose a space-time point process model stemming from [14]. 
The observed earthquakes arc considered as a mixture of earthquake swarms (a swarm 
contains at least two quakes) and single quakes. This could be considered as a special 
case of the "cluster processes" ([2], Section 6.3): a cluster process is composed of clusters 
that contain only a single point and clusters that have multiple points; in our model, we 
distinguish the single point events (single quakes) as the "noise" and the multiple point 
events (earthquake swarms) as the "signal." The conditional probability that a quake is 
in a cluster becomes a natural criterion for declustering. The filtering theory hence can 
be applied. We assume that, at most, one cluster is active at a time. This assumption 
can be relaxed with increased computational complexity 

In the literature, inference for partially observed stochastic processes is often obtained 
by using Markov chain Monte Carlo (MCMC) methods (see, e.g., [6]). A particle algo- 
rithm is also proposed in [14]. Such approximation methods arc more flexible, but they 
are time-consuming and the approximation error is usually difficult to estimate. This 
paper deals with finding analytic solutions for some cases. 

The paper is organized as follows: Section 2 describes the generic model and the fil- 
tering equations; Section 3 presents the computational procedure for the conditional 
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expectation of interest under the "mother quake" assumption (the first quake in a clus- 
ter triggers all the other quakes in that cluster) ; Section 4 illustrates the numerical results 
for earthquakes in central and western Japan; Section 5 summarizes the conclusions and 
describes future work; Appendix A gives the algorithm that calculates the maximum like- 
lihood estimators of the parameters and the most likely cluster path; finally, the proofs 
are contained in Appendix B. 

2. The generic model 
2.1. Formulation of the model 

Suppose observed information about a quake is represented by a mark in a space E. 
For example, E could be R'^, recording the earthquake's magnitude and the epicenter's 
location longitude and latitude. We model observations as a marked point process O with 
marks in E. O is the mixture of two independent point processes and C, which stand 
for the single quakes and earthquake clusters, respectively. Hence letting 0{A,t) denote 
the number of quakes characterized by values in A (A is a subset of E) observed up to 
time t, we can write 



We assume that A^ is a Poisson process with intensity 7 relative to a reference measure 

hence the single quake model is just a Poisson random measure on E x [0, 00) with 
mean measure i'o{du x ds) = 7(w, s)i>{du) ds. 

We model clusters to be randomly initiated and assume they eventually die out; we 
also assume that there is at most one active cluster at a time as mentioned in Section 
1. Let D be the process that indicates whether a cluster is active or not. The process C 
adds a mark u at time s with non-negative predictable intensity X{u,s,Dg_,r]g_), where 
77 is the configuration of both the marks and occurrence times of all the previous cluster 
quakes. More precisely, if cluster quake q occurs at ti, then r]t = X]{r t <t} \ci,ti)i where 
(.) is the Dirac measure concentrated on the point (ci,ti). Therefore, 77 is a counting 
measure on E x [0, 00). 

When Z? = 0, there is no active cluster and an intensity A(ti, s, 0, rjs-) gives the rate at 
which a new cluster is initiated by an event with mark u at time s. Once initiated, the 
cluster grows with intensity A(u, s, 1, 7ys_) until it dies out. 

Under very mild conditions on the intensities (see [5]), the point processes can be 
written as solutions of stochastic differential equations. In particular, we can write 



0{A,t)^N{A,t) + C{A,t). 



0{A,t) N{A,t) + C{A,t) 
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where and ^2 are independent copies of a Poisson random measure on Ex [0, cxo) x 
[0, 00) with mean measure 1/ x i x £, denoting Lebesgue measure by £. 

In this article, we define D as follows: D is equal to 1 once a cluster is initiated; D has a 
probability p to die out (i.e., D = 0) whenever a new observation is added to the cluster; 
D is independent of all previous history. Thus, for an arbitrary function f(Dt,rit), 

f{Dt,r,t)= f l[D,_=o}[.ni,Vs-+Siu.s))-.f{0,Vs~)] 

JExlOM 

(2.1) 

+ = ~ Ic{E,s),Vs-+S^u,s}) - /(l,??s-)]C(du X ds), 

where the {/fe. A: = 1, 2, . . .} are independent Bernoulli random variables with parameter 
p that are also independent of N and C . This follows by writing the right-hand side as 
a finite sum where most terms cancel out. 

In practice, /{DtjTjt) contains information about Dt and r]f Statistical inferences can 
be drawn if we are able to compute the conditional expectation of / based on the obser- 
vations O. The rest of the paper mostly deals with how to realize such a computation 
for arbitrary /. 

It is worth noting that our whole problem is essentially discrete and finite, hence the 
measurability of functions is (and should be) of minor concern. As Dt is either or 1, 
and ijt can only take finitely many values as well (2" if there are n observations), thus 
the function domain of / is finite. Therefore, all the functions are measurable. 



2.2. The filtering equations 



We derive the filtering equation for the conditional distribution of 77 given observations 
of O using a reference measure approach. If {fl,T,Q) is a probability space and P is a 
second probability measure on !F given by dP = L dQ, then for any sub-u-algebra V C T 
and L^-random variable Z, 

E-[Z\V] ^'^^^1^1 



EQ[L\V] 



We are going to use a reference probability measure Q under which the observations 
have a relatively simple structure. In the following lemma. A'' and C are independent 
Poisson random measures under Q. We first introduce a definition that is used in the 
lemma. 



Definition 2.1. A Poisson process N is compatible with a filtration {J-t} if N is 
{J- 1\- adapted and N{t + ■) — N(t) is independent of {J-t} for every t>0. 

Lemma 2.2. On {fl,J-,Q), let N and C be independent Poisson random measures with 
mean measures vq[Au x ds) = 7(u, s)i'[Au) ds and i'i[Au x ds) = Xq{u, s)v((1u) ds, respec- 
tively; let D be a cadlag process independent of N. Assume all processes are compatible 
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with {J-t}. L is determined by solving 
L{t) = l+ [ ( ^("' ^'-P^-' '?^-) _ l^ L{s-)[C{du X ds) - Aq (u, s)v{Au) ds] (2.2) 

iijx[o,t]V Aq(u,s) y 

anrf assuming that L is a {Tt} -martingale. Let P satisfy dP\jr^ = L{t)dQ\jr^. Then P 



IS 



a probability measure and under P, for all A such that A(u, s, Dg, ris)i'{du) ds < cx3 

for each t>0, 

C{A, t)- A(u, s, D,,ris)iy{du) ds 

JAx[0,t] 

is a local martingale and N is independent of C and is a Poisson random measure with 
mean measure vq- 

Thus under P both N and C have the intensity described in Section 2.1. Hence Lemma 
2.2 gives the form of the Radon-Nikodym derivative (or the Hkehhood) of P with respect 
to Q. Our further computation then can be justified by the uniqueness of the martingale 
problem (sec [9] or [4], Chapter 4). The assumption that L is a {jFt}-martingalc is very 
mild. 

Remark 2.3. The following condition is sufficient to ensure that (2.2) is a well-posed 
equation and that i is a martingale. 

(Condition 1) i'(E)<oo,Xq(u,s) and f ' ^ — — ^ are uniformly bounded. 

Aq(u,s) 

The process D has finitely many jumps in bounded time intervals. Thus we can record 
the history of the process Z? by a counting measure ht = X){r t-<t} ^{Dt-,ti}] the sum is 
over those ti when D takes jumps. Hence it represents a path that has value Dt- in time 
interval [ti,ti^i). As in (2.1), let / be an arbitrary function on the two counting measures 
{hs,ris), and set 

q^{f,s) = EQ[f{hs,Vs)L{s)\J^^], (2.3) 

Wf,^ F^ffh .)\tO^ EQ[f{h,,rj,,)L{s)\T^] 0(/,s) 

7r(/,s)=i? [M.,^.)|^J = ^Q[^(^)|^o] = ^y (2-4) 

Since hf contains all the information on Dt, we can write = Dt{ht). Further, we abuse 
the notation a little and write X{u, s, Ds-{hs-),r]s-) = A(u, s, /ig-, f?s-)- We need this 
expression to simplify the notation in the following theorem and in the application in 
Section 4. 

Let a denote the indicator that specifies whether or not a cluster is currently active, 
that is, a{hs,r]s) = 1[Ds=i} ~ ^s- Let I — p and 

/now = [1 - «(■, •)]/(• + S{l,s),- + S{u.s)) 

(2.5) 

+ a{-,-)[pf{- + S(o,s)r + S(u,s)) + qf{- + + S{u,s))]- 
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Theorem 2.4. For an arbitrary function f on {hs,r]s), let (f>, tt and /new be defined as 
in equations (2.3)-(2.5). Then (p and tt satisfy the stochastic integral equations 

c^{f,t)^c^{f,Q)- f <Pifi-r)[Xiu,srr) - XQiu,s)],sMdu)ds 

J Ex[0,t] 



Ex[0,t] 



fncw^—, T /(■, s- -7—, — 7 rO[du X ds) 



and 

7r(/,i)=^(/,0) 

, f 7r(/nowA(w,s,-,-),s-)-7r(A(w,s,-,-),s-)7r(/,s-) , , 

+ / 777 ^ 7 ^ ^ U{du X ds) 

JiSx [o,t] 7r(A(ii, s, •) + l[u, s),s-) 

- / OAl", s, •, •), s) - 7r(/, s)7r(A(M, s, •, ■),s)}iy{du) ds. 

J Ex[0.t] 

In Section 4, we will take / as the indicator functions that indicate the status of ?/, so 
that T:[f,t) gives us the conditional probability that an observation is in the cluster. 



3. Solutions of the filtering equation 

Unlike the infinite-dimensional nonlinear filtering problem, the solution of which can only 
be approximated, the function space in our problem allows a natural finite decomposition 
since wc have a finite function domain, that is, all the possible combinations of each 
observed event being in a quake swarm or not. Thus the exact solution could be computed 
theoretically, but generally the computational load increases exponentially as the number 
of observations increases. That is not feasible for online updating. 

In this section and also in Appendix A, we assume that when a cluster is active, the 
cluster is assumed to be triggered by the first quake (mother quake); when no cluster 
is active, a new cluster will be initiated randomly with an intensity e. To be precise, 
suppose one observes Ui at time r^. Let yi = (ui,Ti) and the set of observations by time 
t be 0{t) = {yi,y2, ■■■,Vk-Tk<t< Tk+i}. Then we have 

k 

X{uXDt-,m-) ^ Dt^Y.^^^'^^^yMyi) + (1 - Dt-)e{u,t), (3.1) 
1=1 

where Oo{yi) = l^y. ^/jg mother quake in the currently active cluster} ^-^d 0o(?/i) is defined as 

if there is no active cluster at that time. We suppose that the functional form of 
\{u,t,yi) is known. For the application in Section 4, X{u,t,yi) is a Gaussian kernel (4.1) 
that docs not depend on t. Note that there is, at most, one 9o{yi) (i = 1, 2, . . . , A:) non-zero 
at any moment t. The simple fact that 0{){yi)da{yj) = if i ^ j makes finding an analytic 
solution possible (see the proof of the following theorems) . Formally, we can think of the 
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intensity A as a vector with component X{u,t^yi) at each "orthogonal" direction Oo{yi), 
i = 1,2, . . . ,k. With the help of this kind of "orthogonal decomposition" of the function 
space, the problem can be reduced to be of polynomial complexity. 

For simplicity, we also assume that there is no cluster active at time 0. Define a{y,t) = 
X{u,t,y)i/{du) ior y G E and e{t) = J^e{u,t)v{du). The following two theorems give 
the algorithm to compute 7r(/, f), / is an arbitrary function of Ds and rjs. Recall that 
a{Ds,iis) = l{n,=i} = Ds. 

Theorem 3.1. For Tk <t< r^+i 
where 

hk{t) ^ 



ta f \ \ TT 6lo(j/)a,Tfc+i-)(7fc+i +gAfc+i,0 ■ ^ , , . 

7r(t'o(2/OQ;,Tfc+i) = , Kk + 1, 

dk+i 

,^ , s . ELiMfc+ij -efe+i)'^(^o(2/j)a,Tfc+i-) + efc+i 

7r(6'o(yfc+i)a,rfc+i) = , 

«fc+i 

where ^k+i =^{yk+i,Tk+i), \k+i.i = \{yk+i,Tk+i,yi), £k+i = e{yk+i,Tk+i) and dk+i = 
Ej=i(Afc+ij -efc+i)7r(6'o(yj)a>Tfc+i-) +£fe+i +7fc+i- 

In Theorem 3.3, we can solve for 7T{6o{yi)a,t) as the first step in our algorithm. The 
task of computing 7r(/, t) for more general / is completed in the next theorem. Note that 
the solution Tr{6o{yi)a,t) is needed in (3.4). 

Theorem 3.2. For Tk<t< Tk+i, 

^(^o(y.)a/,i) =^(0o(y.Oa/,U-)e"^--"^''""^"'^'^'''6fe(t), (3.3) 

7r(/, t) satisfies 

^^^ = - E HOo{x)af,t)[a{x,t)~e{t)] 
xeY{t) 

+ 7r{f,t) E Tr{eo{x)a,t)[a{x,t)~e{t)], 



(3.4) 
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and 

T^{So{y)af,Tk+i~)jk+i+q>^k+i,in{f{l,- + 5y^_^,)0a{yj)a,Tk+i~) 



i<k+l, 



dk+1 

7r(/(l, ■ + Sy^^Jjl - a), Tk+i-)ek+i 
dk+i 

7r(/,Tfc+i-)7fc+i + 7r(/(l, • + (5yfc+i)(l - a)£fc+i, rfc+i-) 
ELi •^fe+i.j7i'(b/(0, • + Sy,^,) + qf{l, ■ + Sy^^,)]eoiyj)a, Tk+i-) 

where /(O, • + (5^,+ J = /(• + (5(o,rfc+i)' ' + '^^'=+1)' /(!'• + -^y^^+i) =/(' + • + (^a.+J. 
Combining Theorems 3.3 and 3.2, we can compute 7r(/, t) for an arbitrary / in real time. 

4. Application to an earthquake data set 

We use the same data set as in [8]: the earthquakes in the period of 1926-1995 in the 
rectangular area 34°-39°N and 131°-140°E with magnitudes greater than 4.0 and depths 
less than 100 km. 

We take 1/ to be the uniform measure on the rectangular region, j{u) = 7 and 

X{u,Dt,r]t) = i{Dt=i}J2^=i H'^,yiWo{yi) + l{i5t=o}e, where A(w,yi) is proportional to 
a bivariate normal density: 

A(w, y,) = Aexp(-||u - /2d)/(27td). (4.1) 

For our mother quake model, the maximum likelihood estimations (MLE's) are 7 = 
0.1070, A = 1.3274, e = 0.0126, d = 0.0070, p = 0.2035. The log-likelihood is log(L) = 
-21604. 

We are interested in 

-*-{!/ is a quake in a cluster} 

and 

d^ti'} ■) — l{a cluster is active at time t}- 

We compute TT{9{yi){-,-),T) for all observed yi according to Theorem 3.2, where T is 
the last moment of the year 1995. The results are compared with [8], where the au- 
thors declustered the observations by computing aftershock probabilities under an ETAS 
model. In Figure 1, plot (a) is the histogram of the aftershock probabilities as presented 
in [8] . We denote their aftershock probabilities as pi . Plot (b) shows the distribution of 
the conditional probabilities p2 in the mother quake model. We are pleased to see that 
our stochastic models give relatively deterministic answers. Around 95% of quakes have 
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Figure 1. Histograms of (a) aftersiiock probabilities under ETAS model and (b) conditional 
probabilities to be in the cluster process in mother quake model. 
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Figure 2. Histogram of P2 — pi- 



a probability of being in clusters that is either smaller than 0.1 or greater than 0.9, as 
can be seen in plot (b). 

Although both results have a bimodal shape, the one in [8] disagrees with our models 
for many individual quakes. This can be seen from Figure 2. The histogram presents the 
difference of these two probabilities. 
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Figure 3. Time-space plots of the 1500 likely clustered earthquakes under different models: 
(a) under ETAS model; (b) under the mother quake model. 

It seems that the data set supports our model more. We plot the earthquake clusters 
in each setting by removing quakes with a low probability of being in a cluster. The 
time-space plots in Figure 3 have 1500 quakes. The vertical axis represents time (unit in 
days). It is quite clear that the plots from our models have a stronger cluster pattern. 
The three-dimensional plots are available from http : //www . stat . nus . edu . sg/~stawz/ 
and can be rotated and viewed in different perspectives. 

We also can compute 7r(Z?ri(-, ■),T) to see the status of the cluster at different times. 
Under the mother quake assumption, Figure 4(a) gives us the conditional probability 
that the earthquake cluster is active. The answer is again quite distinct. Figure 4(b) 
shows that most conditional probabilities are either close to or 1. 

5. Discussion 

Assumption (3.1) is just an example. Another earthquake model called the "domino" 
model is given in [14], where we assume that the last quake triggers the next quake in 
the cluster. It turns out that the mother quake model is more likely in our data set 
by comparing their likelihood. Roughly speaking, as long as the conditional intensity A 



A cluster identification framework 



367 



(a) 

1 

0.9 
OS 
0.7 
0.6 
0.5 
0.4 
03 
0.2 
0.1 

°0 0.2 0.4 0.6 OS 1 1.2 1.4 1.6 1.8 2 

? 10' 

(b) 

2500 

2000 
1500 
1000 
500 

°0 OA 02 0.3 0.4 OTS 0^5 07 0-8 0.9 1 

Figure 4. Under the mother quake assumption: (a) the conditional probability that the cluster 
was alive vs. time; (b) histogram of the conditional probability. 

modeling the cluster only depends on a "small" portion of the history (in (3.1), it only 
depends on the last mother quake), we can adopt an "orthogonal decomposition" and 
find an algorithm to find the analytic solution. 

Thus assuming that, at most, one cluster is active at a time is not essential for our 
method. This simplified assumption prevents the presentation from getting more messy. 
We can similarly work out a decomposition if we assume that, at most, say, three clusters 
are active at a time. 

Our filter separates the data set into the cluster quakes and the single quakes. Further 
data analysis in [14] shows geophysical differences. In particular, the magnitude of the 
cluster quakes is significantly different from the single quakes. The mother quakes are 
also significantly bigger than the offspring quakes. Note that we did not incorporate the 
magnitudes of the earthquakes into the model. This surprising finding further supports 
our model. 
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The application to seismology is only a special case of the filtering approach to abnor- 
mal cluster identification proposed in [14]. Other possible applications include epidemi- 
ology, intrusion detection in network security, criminology and quality control. 

Appendix A: Likelihood and maximum likelihood 
estimators 

A.l. Likelihood 

Theorem A.l. Let v he a finite measure and, on {fl,J-',Q), let N and C he independent 
Poisson random measures with mean measures i'{du)ds; let D satisfy {2.1). Assume all 
processes are compatible with {J-t}. Define Ljv and Lc hy solving 



LN{t)^l+ I Y){u,s)-l]LN{s-)[N{duxds)-v{du)ds\, (A.l) 

J Ex[0,i\ 

Lc{t) = l+ [ [X{u,s,D,_,Tjs-)~l]Lc{s-)[Ciduxds)-i^idu)ds] (A.2) 

JExlO,t] 

and assume that they are {J- 1] -martingales. Let L ~ L^Lc- L will also be an {J-t}- 
martingale. Let P satisfy dP\^^ = L{t) dQ\]^^ . Then P is a prohahility measure and under 
P, for all A such that A(u, s, Ds,ris)i'{du) ds < oo for each t > 0, 



C{A,t)- \{u,s,Ds,7jsMdu)ds 

JAx[0,t] 

is a local martingale and N is independent of C and is a Poisson random measure with 
mean measure '){u,s)v{du)ds. 

Remark A.2. The L derived from the theorem is the likelihood of our observation, 
which is the mixture of two processes. It is necessary to have it for the estimation of 
parameters. While in Lemma 2.2, the simplified version (2.2) is sufficient for proving 
Theorem 2.4, since it only concerns f{Dt,rit), which does not involve the process TV. By 
applying L derived here, we can prove a more general form of Theorem 2.4 so that we 
can have filtering equations about f{Nt,Dt,r]t). We omit it because the notation gets 
worse and we do not use it in our application. 

Our goal is to compute E'^[L\T'^], the likelihood in our model. We can first solve (A.l) 
and (A.2): 

Ljv(t)=cxp< / log7(u, s)A(du X ds) 
lJExlo,t] 

[7(u,s) - l]i^(du)ds L 

Ex[0,t] ) 
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Lc{t)^cxp< / \og\{u,s,Ds-,r]s-)C{du X ds) 



[A(w,s,-Ds_,77s_) - l]i/(dw)ds 

ExlO,t] 

Let 

CiA,t)^ f poiE,s){^)Oiduxds), (A.3) 

JAx[0,t] 

where p is the indicator of whether the observation is a cluster point. Under reference 
measure Q, pi,p2, ■ • ■ are i.i.d. BcrnouUi(l/2), and 

L{t) = LN{t)Lcit) 

= exp< / [{1 ~ po{E,s))iog^{u,s) + po{E,s)^og\{u,s,Ds-,'ns-)]0{du X ds) 

iJEx[0,t] 

[X{u, s, As-, ??s-) + 7(^7 s) - 2]i'(dw) ds 

ExlO,t] 

(xexp<j / [{l~po(E.s))^og-fiu,s)+po(E,s)^ogX{u,s,Ds^,ris^)]0{duxds) 

lEx[0,t] 

{X{u,s,Ds-,r]s^) +^{u,s))v{du)ds 

ExlO,t] 

We use "oc" since we ignore a constant, which has no impact on hkehhood inference. 

Recah 7j = 7(?/j,Tj), Xj,i = X{yj,Tj,yi), ej = £{yj,Tj). Let tq = and suppose one 
observes Ui at time r^, yi — {ui,Ti). 



(A.4) 



L(t„) oc exp< — / 7(u, s)i^(dit) ds 

I JexIO,t„] 

xT\jl^'''X{ui,n,Dr^^,rir^-Y'cxpl- / X{u,s,Ds^,r]s^)v{du)ds\ 

t=l I JiSx[T,_.i,r,] J 

Note that, conditional on observations O, all the possible partitions of O ~ C + N are 
equally likely under Q. The conditional distribution of L^Lq is completely determined by 
{pi} and {/fc}. {Ik} is independent of {pi} since it is independent of N and C. Hence the 
computation of E'^[L\!F'^] is reduced to calculating the expectation of L{{pi}, {Ik}) with 
i.i.d. Bernoulli (1/2) {pi} and i.i.d. Bernoulli(p) {Ik}- This expectation can be expressed 
as a weighted sum over all possible values of {pi}, {Ik}- 
We recall the special form of X{u,t, Dt,r]t): 

i-l 

X{ui,Ti,Dri^,riri-) = Dr,^^Xij9Qi{yj) + (1 - Dri-)£i, 
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where Ooi{yj) — '^{y. is the latest mother quake in the cluster at time n-} ^ij is defined aS 

in Theorem 3.3. Let Ei be the index of the latest mother quake at time Ti — , then 

X{ui,Ti,Dri^,riTi~) = Dr^^Xi^E, + (1 - £'r.-)ei- 

Assmne no cluster is active at time 0, hence Dq = 0. Therefore, 

L{ti) (xjl~''^ei^ exp< — / [e{u,s) + j{u,s)]v{du)ds>. 

I Jex[Q,ti] ) 

Sum over two terms corresponding to pi = and pi = 1, respectively, and we have 

eQ[L{ti)\J^"] oc exp(- / [e{u,s) +j{u,s)]iy{du)ds\. 

2 I J£;x[o,Ti] J 



A. 2. The forward algorithm and MLE 



It is not practical to sum over all the terms by brute force since the number of terms 
increases exponentially with respect to the number of observations. However, wc can re- 
duce the complexity to O(n^) by computing expl/^^jg ^ j ^{u, s)i'{du) ds}E^[L{Tn)\J-'^] 
recursively. We recall that when a single quake is observed at time Ti it has no impact 
on the cluster quakes iJi+i ~ Ei and Z)^. = i^r;-, while when a cluster quake is observed 
at time Ti, there are two possible scenarios: The first case is D^- = 0, so the observation 
at time Ti is a new mother quake 0^ = 1 and Ei^i = i. The second case is D^-.- = 1, so 
the observation is an offspring quake; hence, E'i+i = Ei. This observation kills the cluster 
with probability p. In other words, we look at a Bernoulli(p) random variable li, which 
is independent of everything else. If = 1, = 1; otherwise = 0. 
For < « < j, 

EQ[L{T,)^D.^^„,^^^^^y\T^] 

= EQ[L{T,)l^D.^_^^o^^^^^}Mp,=o}\^'']+E'^iL{T,)l[D^^^^ 



^(■^i-i)7jcxp<^ - / [e(w,.s) + 7(w,s)]i/(du) 

I J£;x[rj_i,rj] 



ds 



1{I).^_1=0,E^.=,}1{P,=0}|-^^ 



E^ 



i(r,_i)Aj,jexp<^ - / 



(A(u, s, lit) + 7(u, s))iy{du) ds 



:7jexp<— / {e{u, s) + ^{u, s))i'{du) ds 

I ExlTj-l.Tj] 
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+ Aj,iexp<^- / {X{u,s,yi) +"f{u,s))iy{du)ds 

xi?Q[i(r,_i)l{i..^._,.,.^..}l{P,=i}lU.=i}l-^°] 



:exp-^ — / '-f{u,s)i'{du)ds 

/_Ex[rj_i,Tj] 



X i Ij exp 

+ pXj^i exp 



£(u, s)iy{du) ds 



BX [Ti_i,T,- 



A(w, s, yi)z^(du) ds 



/£;x[Tj_i,rj] 



The second equality utilizes the fact that the switch Ij is triggered only when a cluster 
point is observed. The last equality uses the fact that Ij is an independent Bernoulli 
variable with parameter p. Since pj is an independent Bernoulli variable with parameter 
1/2, we have: 



exp( / j{u,sHdu)ds]EQ[L{T,)lD^^__„,.A\T''] 

U£;x[r,_i,r,l J I 3 J + i 



1 



:7j exp 



Bx [tj-i.Tj] 



e{u,sHdu)ds^EQ[L{T,^,)lD^^^_^^^^^^^^^\T^ 



1 



-pXj,i exp 



We ignore the constant factor 1/2 in the algorithm. 

This raises our forward algorithm. Let lj{d,i) (x E^[L(Tj)l^o^,^a e j_=i}\-^'^]- where 
d = 0, 1; J = 1, 2, . . . ; i = 0, . . . ,j — l,j . (i = indicates that there is no cluster point.) 

;i(0,0) = 7iexp{-/^^[o^^^]e(?/,s)i^(dM)ds}, ;i(l,0) = /i(0,l) = 0, /i(l,l) = ei x 
exp{— /^x|Q e(w, s)h'{du) ds}. Furthermore, 



7j exp 



-pXj^iCxpl 



e{u, s)i'{du) ds >/j_i (0, i) 



Ex[tj-1,Tj] 



X{u,s, j/i)i^(du) ds Mj_i(l, i), < i < j. 



(A.5) 



i = j. 
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J J exp 



X{u,s,yi)i'{du)ds >lj^i{l,i 



+ lj-i{l,i)qXj,iCxp< - / \{u,s,yi)iy{du)ds)-, 

L J_Ex[tj_i,Tj] 

Zj_i(0, /c)ej cxp< — / e(u, s)i^(dw) ds >, 



^ * < (A.6) 



The likelihood until time tj is = cxp{— J^^ ^ j 7(w, s);y(dw) ds} ^:^^q ^|^^q Zj(c?, i). 
Even when there is a moderate number of observations, the scale of the likelihood often 
exceeds what a computer can handle. Therefore, we compute the log(ij) instead. The 
trick is normalizing Ij at each step, thus we have the forward algorithm: 

1. Computing the normalizing constant: cj-i = X]i=o X]d=o 'i-i ('^' 

2. Normalization: lj-i(d,i) = lj-i{d,i)/cj-i. 

3. Updating lj{d,i) according to (A. 5) and (A.6). 

4. log(i) = log(cj) - Jex[o,t„] s)iy{du) ds. 

From the discussion above, we can find the likelihood for any specific set of parameters. 
Looking for the MLE hence is a standard optimization problem. It turns out that a non- 
derivative method works better in the example of this paper. In particular, we use the 
Nelder-Mead simplex method to search for the MLE (see [11]). 

The asymptotic confidence intervals for the parameters can be constructed. Observe 
that we have a hidden Markov model (HMM), essentially. Hence the theorems of asymp- 
totic normality of the MLE for a general HMM should apply (see [1, 3, 10]). Consequently, 
there are corresponding likelihood-ratio tests for the HMM as established in [7]. The 
asymptotic confidence intervals can then be constructed by inverting the test statistics 
(see [14]). 



A. 3. The most likely cluster sequence 



We borrow the Viterbi algorithm from HMM literature to compute the most likely cluster 
sequence in our setting. 

Let l*{d,i) be the maximum likelihood of all cluster sequences with D^-^ = d and yi as 
the latest mother quake. As in Section A.l, 



/^(0,0)=7iexp - / s{u,s)uidu)dsl Z*(l, 0) = ZJ(0, 1) = 0, 

I J_Bx[0,ri] 

/^(l, 1) = El cxp< — / e{u,s)iy{du)ds 

I J_Ex[0,ri] 
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7j exp 



-Ex[Tj_i,rj] 

max'^ pXj,i cxp 



e(w, s)iy{du) dsjl*_i{0, i), 
X{u, s, yi)iy{du) ds 



J J exp 

ro, 

max'|7j cxp 

qXj,i exp 
inax< Ej exp 



£;x[Tj_i,Tj 

e(M, s)h'{du) ds 



BxIt.-i.t/ 



£;x[T,_i,r,-; 



A(w,s,j/j)z^(du) ds 



-Ex[rj_i,Tj] 



X{u,s,yi)iy{du) ds >l*_j^{l,i) 



e{u,s)v{du) ds >l*,^{0,k) 



-Bx[Tj_i,Tj] 

fc = 0,l,...,i-ll, 



i = 0, 



(A.7) 



1 < * < j, 



1 < j < i 



(A.8) 



i=j. 



So the procedure to find the most Ukcly cluster sequence starts from the calculation of 
l*{d,i), using recursion in (A.7) and (A.8) while always keeping a record of the "winning 
sequence" in the maximum finding operation. Finally the last state {d, i)* is found where 

{d,t)* = aTg max l*„{d,i) (A.9) 

a— 0,l;0<2<n 

and, starting from this state, the sequence is recovered by backtracking. As before, nor- 
malization is necessary in each step of the recursion to prevent them from degenerating 
to or infinity. 



Appendix B: Proofs 



Proof of Lemma 2.2. 

By Theorem III.20 of Protter [13] 



M(A,t) = C(A,t) - / XQ{u,s)i^{du)ds 

JAx[Q,t] 



1 f X{u,s,Ds-,ih-) -, 1 w ^ , . 

— 1 \ L[s—)C[du X ds) 



Axlo,t]L{s)\ Xq{u,s) 



■C{A,t) - / XQ{u,s)iy{du)ds 

htlllEf^lIhA^MhAcidu X ds) 

Ax[0,t] X[U,S,Ds-,7]s-) 
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is a local martingale, and hence 

X{u,s,Ds-,ris-) 



is as well. 



M {du X ds) = C{A, t)- I X{u,s,Ds-, 77^- )^(dM) ds 

JAx[0,t] 



□ 



Proof of Theorem 2.4. To simplify the notation, we use f{0,ris- .j-)) to denote 

/(• +'5(o.s),'7s- +S(u,s)) and f{l,f]s- to denote f{- + S^i^s),Vs- +S{u,s))- 

Noting that 



J Ex[0,t] 



+ '^{D,^=l}[fii^ - Ic{E.s))Ds-,ris- + S(^u,s)) - /(^.s-,'?s-)]} 

' A(m,s,?7^^ 
Aq(m,s) 



1 L(s-)C(du X ds), 



Jo 



+ / L{s-)dfo{h{s),7^{s)) + [f,L]t 
Jo 

/(^o,%)+/ f{hs-,7]s-)dL{s) 



Bx [0,t] 



X{u,s,r]s 



\q{u,s) 



-L{s-)C{du X ds) 



fiho,m)+ / fihs-,ils^) 
JEx[a,t] 



X{u,s,hs-,ijs^) 



- 1 



L(s-) 



Bx [0,t] 



Aq(u,s) 

X [C{du X ds) - Aq(u, s)i^(dit) ds] 

{l{D,_=0}[/(l,'/s- +^(«,s)) - f{hs-,Vs~)] 

+ I{r.,_=l}[/((1 - Ic(E.s))Ds-,Vs- +^{u,s)) - /(^s-,'7s-)]} 
A(u,S,/ls_,77s-) 



Ag(M,s) 



-i(s-)C(dM X ds) 
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/(^o,%)- / f{hs,ris)[X{u,s,hs,T]s)~XQ{u,s)]L{s-)iy{du)ds 

JEx[Q,t] 



J Ex[aj] 

+ 1{Z3.,_=1}/(1 - Ic(E.s).Vs- + 5(u^s))\ Xq{u,s) 

- f{hs-,r]s-)}L{s-)po(E.s){u)0{du x d.s), 

where 

C{A,t)= / po{E.s){u)0{duy.ds) 

J Ay.[0,t] 

and under the reference measure, the {pfc(-), A: = 1, 2, . . .} are independent with Q{pk{u) = 
1} = 1 — Q{pk{u) = 0} = Aq(m, s)/[Ag(u, s) + 7(w, s)] and are independent of O. 

Averaging out the random variables that are independent of O under Q, the equation 
for the unnormahzcd conditional distribution becomes 



^(/,i)=0(/,O)- / (^{f{-,-)[\{u,s,;-)-XQ{u,s)ls)u{du)ds 

JBx [O.t] 

+ </> /new-;— 7 ^ - f{-r),s- ^ — ;^ -O du X ds . 

Applying Ito's formula, 
^(/,0-7r(/,0) 

, /■ 7r(/„cwA(u,s,-,-)>s-) -7r(A(M,s,-,-)'S-)i'(/.s-)^/ , ,n 

+ / 7T7 ^ 7 ^ ^ 0[du X ds) 

JExlo.t] 7r(A(w,s,-,-)+7(u,s),s-) 

""(/(■^ ■)^('") s) - 7r(/, s)7r(A(M, s, ■, ■), s)j^(dM) ds. □ 

£;x[o,t] 

Proof of Theorem 3.3. We apply Theorem 2.4. For Tk <t < Tk+i, 
n{eo{y)a,t) =TT{9o{y)a,Tk) - / n{eo{y)aX{u,s,h,r]),s)i^{du)ds 

JEy.[Tk,t] 

iT{9o{y)a, s)7r(A(w, s, h.rf)^ s)v{du) ds 

-Ex[Tfc,t] 



^ 7r(^'o(y)a,Tfc) - 

7r(6'o(y)a,s) 

-Ex[Tfc,t] 



/ TT >^ 6'o(2/)6'o(a;)aA(u, s, x), s J zy(du) ds 
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TT{9Q{x)a{X{u, s, x) — e{u, s)),s) + e{u, s) 



Ueo(s) 



v{du) ds 



TT{9o{y)a,Tk) - / ■7T{9o{y)eo{x)a,s) / X{u,s,x)iy{du)ds 

JiTk-.t] Je 



lrk,t] 



n{do{y)a,s) 



7r(6'o(a;)a, s) / X{u, s,x) — e{u, s)i'{du) + e{s) 



.xGO(s) 



ds 



:7r(6'o(j/)a,rfe) - / 7r(6'o(2/)a, s)a(j/, s)l{j^go(TO} 

[r,.,t] 



+ / 7r(6'o(y)a,s) 



7r(0o(a;)a, s)(a(a;, s) - e(s)) + e(s) 



ds. 



We use the fact that a = X^ajgoCs) ^o(2;)a to get the second equahty. Thus, for 



1, 2, . . . , fc, 

d^(go(y^),t) 
dt 



-a(y„t) + e(s) + ^[a(yj,t)-e(s)]40o(yj),O U(^o(2/z), 0, (B.l) 



(3.2 ) is the unique solution of this system of ordinary differential equations. 
At time t^+i, 

7r(6'o(y)a,rfe+i) = 71(6*0 (2/)a,Tfe+i-) 

^ 7r([l - «(■, ■)][6>o(t/)(l, ■ + (?(t,,s))]A(7/fc+i, Tfc+i, -, ■), Tfc+i-) 

^ 7r(a(-, ■)g[(6lo(t/)Q;)(l, ■ + l>(u,s))\X{yk+\,Tu+\, ■, ■), Tk+i-) 

dk+i 

Tr{X{yk+i,Tk+i,-,-),Tk+i-)Tr{9o{y)a,Tk+i-) 



7r(6'o(y)a,Tfc+i-)7fc+i 
dk+i 

_^ 7r([l - a(-, ■)] [6*0(7/) (1, ■ + 5(u^s))]X{yk+i,Tk+i,-, ■),Tfc+i-) 

_^ 7r(Q(-, ■)g[(6'o(y)a)(l, ■ + (5(„^s))]A(2/fc+i, U.+i, ■, ■), Tk+i-) 

dk+i 
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For i<k + l, 

7r(t'o(2/i)a,rfc+i) = . 

For i^k+1, 

,^ , . s J2l=ii<l^k+i.j -ek+i)n{eoiyj)a,Tk+i-)+ek+i 
'^(()o{yk+i)a,Tk+i) = . □ 

Proof of Theorem 3.2. For Tk<t< Tk+i, 

7r(6'o(2/)Q:/,i) = 7r(6'o(y)a/,Tfe) - / 7r{eo{y)af\{u,s,h,rj),s)i^{du)ds 

JEx[TkA 

+ / T:{9o{y)af,s)TT{\{u,s,h,vi),s)v{du)ds 

JEx[Tk,t] 

= T^{Ooiy)af,Tk) - TT ^ eo{y)9oix)af\{u,s,x),s\v{du)ds 

+ / 7ri9oiy)af,s) 

X < 7T{0o{x)a[X{u,s,x) — e{u,s)],s) + e{u,s)>i^{du)ds 

Ueo(s) J 

= ■^{Ooiy)af,Tk) ~ ^ n{eo{y)9o{x)af,s) \{u,s,x)v{du)ds 

7^(^o(?/)a/,s) 

7r(6'o(a:)Q;, s) / \{u, s,x) — £{u, s)h'{du) + e{s) > ds 



[r.,t] 



X 



■^{Soiy)af,Tk) - / 7r(6lo(?/)a/,s)a(?/,s)l{j^go(rfc)}ds 
7T{0o{y)af,s) 

[r.,t] 



X 



7r(6'o(x)a, s)[a{x, s) — e(s)] + e{s) > ds. 

x6y(rfc) J 
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dt 



-a(y„f) +^[a(yj,t) -£(s)]^(0o(%),t) \7^{eo{y^)af,t), (B.2) 

1 = 1,2,.. .,fc 



Comparing (B.2) and (B.l), wc conclude: 

■K(eo[y)af, s) = — — ^ 

and (3.3) follows easily. 

,a f \ -F \ la ( \ f \ 7r(A(yfc+i,Tfc+i,-,-),T-fc+i-)7r(6'o(y)a/,TA;+i-) 



^([l-a(v)][(go(y)/)(l,- + (5(„,,))]A(;/fc+i,r,.+i,.,.),rfc+i~) 



+ 



7r(a(-, ■)g[(6lo(7/)/)(l, ■ + (5(„^^))]A(yfc+i, r^+i, ■, ■),rfc+i-) 

_ [1 - 5y^^^{y)]'K{eQ{y)af,Tk+i-)-lk+i 
dk+i 

^ [1 - V+i(;/)]gEj=i Afc+i.j7r(/(l, ■ + 5y^^,)eo{yj)a,Tk+v 

dk+i 

_^ '^i/fe+i(y)7r(/(l, ■ + '?^;fc+i)(l - a), Tk+i-)ek+i 
dk+i 



7r(/,Tfc+i-) 
= 7r(/,rfc) 



/Bx[Tfc,Tfc+i] 

^7r(/,Tfe) 



f 7r(/A(u, s, /i, 77), s)z^(du) di 

£;x[Tfc,rfc + i] 

7r(/, s)7r(A(u, s, ft,, 77), s)i/{du) ds 



-I 



-Ex[Tfc,rfc+i] 



IEx[Tk.Tk+ 

7r(/,s) 



TT j 0o(^)Q!/-^("5 2;) + (1 — Q!)e(w, s)/, s ] j/(du) ds 

il \x£0{s) J 



7r(6'o(a;)aA(u, s, x), s) + (1 — Q!)e(M, s) 
Ueo(s) 



j^(du) ds 
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= 7r(/,rfc)-/ ^ n{eo{x)af,s)[a{x,s)-e{s)]ds 

T^ifis) ^ TT{9o{x)a,s)[a{x,s) - e{s)]ds, 

l-rk.Tk + l] xeO(Tfc) 

where a(x,s) = A(it, s, x)i^(dit), e(s) = J^e(u,s)i/(du). Hence, 
= _ ^ 7r(0o(x)a/,t)Kz,i)-e(s)] 



^(/,t)J ^ 7r(0o(x)a,i)[a(x,t)-£(s)] i, 

IxeY(rk) J 

7r(A(2/fc+i , Tfc+i ,-,•), Tk+i-)TT{af, r^+i -) 



_l_ 7^(./'ncwA(yfc+i,rfc+i, ■, ■),rfc+i-) 

7r(/,rfc+i-)7fc+i +7r(/(l,- + -Q;)£fc+i,rfc+i-) 
, ELi -^/c+ij7r([p/(0, • + 5y,^,) + qf{l, ■ + (5y,+i)]6'o(%)a, r^+i-) 



□ 



Proof of Theorem A.l. Apply Lemma 2.2 and note the independence of C and N. □ 
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